The Assembly Tutor 


WELCOME! 


Welcome to The Assembly Tutor—the BASIC programmer’s 
guide to learning assembly language. Unlike many 
beginning books on assembler that emphasize games or 
sounds, this discussion will concentrate on developing 
practical routines that do useful things when called from 
BASIC. We’ll look at how to create assembly language 
programs, call them by name or memory location, and 
how to pass variables back and forth to BASIC. 


This tutorial assumes that you already understand basic 
programming concepts such as variables, arrays, and 
subroutines. As we proceed, most of the examples will 
provide parallels to BASIC where possible. And please 
remember one important point—there is nothing 
inherently difficult about learning assembly language. 
Attitude is everything, and if you can think of 
assembler as a "stripped-down" version of BASIC, you’ll 
be successful that much sooner. 




Throughout this discussion we will be referring to the 
8088 microprocessor that is used in the IBM PC. Please 
note that everything said about the 8088 also applies 
to the 8086, the 80286, the 80386, and the NEC "V" 
series found in some PC compatibles. 




AS EASY AS BASIC 


Assembly language follows the same general form as a 
BASIC program. That is, commands are performed in 
sequence until a GOTO or GOSUB is encountered. (These 
are called Jump and Call respectively.) Many BASIC 
instructions have a direct assembler equivalent, though 
the syntax will be slightly different. One important 
difference, however, is that the 8088 microprocessor 
can operate only on integer numbers. Another is that 
for the most efficiency, you will be limited to a few 
working variables. Let’s begin by looking at some 
assembly language commands, and see how they are 
analogous to those of BASIC. 


Consider the following BASIC program fragment: 

AX = 5


Here, we are assigning the value 5 to variable AX. The
8088 has several built-in variables, one of which is
conveniently called AX. To "move" the value 5 into
variable AX we’ll use the Mov instruction:


Mov AX, 5 




In assembly language programming, the destination 
variable is always on the left, while the source is on 
the right as in BASIC. Not too bad so far, agreed? 
Now let’s consider addition and subtraction. To add 
the amount 12 to AX in BASIC you would do this: 




AX = AX + 12 
The equivalent 8088 command is 


Add AX, 12 


Again, the variable on the left is always the one that
receives the results of any adding, moving, and so on.
Subtraction is very similar:


AX = AX - 100


vs.


Sub AX, 100 


There are 6 general purpose variables available for you
to use (not counting all of the PC’s memory). They are
AX, BX, CX, DX, SI, and DI. Funny names maybe, but
they do just about all of the work. By the way, in
assembler talk these variables are called registers.


Each is capable of doing the most common functions like 
adding and subtracting, though some are specialized for 
certain other operations. We’ll get to that momentarily. 




Comparing and branching in assembly language is also 
quite similar to BASIC. But instead of saying 


AX = AX + 2 
IF AX > 60 GOTO Finished 


you’d do it in assembler this way: 


Add AX, 2 
Cmp AX, 60 
Ja Finished 




We’re telling the 8088 to Add 2 to AX, then Compare AX 
to 60, and finally to Jump if Above to the code at label 
Finished. There are several kinds of conditional jump 
instructions in assembly language, often following a 
compare as shown above. In fact, all you can really do 
after a compare is jump somewhere based on the results. 
And while there is no direct equivalent for 


IF AX = 10 Then BX = BX - 1 


if we change the strategy to 




IF AX <> 10 GOTO Not10 
BX = BX - 1 
Not10: (program continues) 


then a direct translation is simple: 


Cmp AX, 10
Jne Not10
Dec BX
Not10: (program continues)


Jne stands for Jump if Not Equal. Also, notice the
command Dec, which means decrement by 1. This is
one case in which an assembler instruction is actually
more to the point than its BASIC counterpart, and is
equivalent to the BASIC command


BX = BX - 1


While Sub BX, 1 would work just as well, using Dec is
faster, and we all know that speed is the name of the
game.


The complement to Dec is Inc, short for increment by 
one. You can use Inc and Dec with any of the 8088's 
registers, as well as on the contents of any memory 
location, which brings up an important issue.




MEMORY VARIABLES 


At some point, many programs will require more 
variables than can be held within the CPU’s registers. 
All of the available free memory in a PC can be used as 
variable storage, with only a few limitations: 


1) You must first tell the assembler how much space
to set aside (much like dimensioning an array),
though assemblers are pretty friendly and allow
you to use names for the memory locations. In
fact, you don’t even need to know the actual
memory addresses; the assembler does that for
you as well!


2) Adding, subtracting, incrementing, and
decrementing are all much faster when done
within registers.


3) Some operations can be done only using
registers. If you want to, say, multiply the
memory variable Counter by 12, you’ll first have
to move it into AX, do the multiplication, and
then move it back again. As I said earlier, it’s
not as complicated as many think, just a bit
tedious sometimes.


While we’re on the topic of variables, there’s another 
handy place to save things, especially when you just 
need the storage temporarily. The stack is an area 
of memory set aside specifically for this purpose. In 
some ways it is like the scratch memory on a four-function
calculator, and is often used for storing
intermediate results. It’s also the most common place
for passing variables between programs, because neither
program has to know exactly where in memory the stack
is located.




You’ll notice that so far, not one mention has been 
made of absolute memory addresses. For the most part, 
assembly language doesn’t require you to pay attention 
at that level—especially for the kinds of things we’re 
doing here. The only exceptions might be when writing
directly to the display screen, or when looking at low
memory, for example to see whether the Caps Lock key
is active.




REGISTERS 


Now let’s get back to the registers, and how they 
differ. As mentioned earlier, all are capable of 
adding, subtracting, incrementing, and decrementing, 
but each has its own specialty as well. For example, 
the AX register is the only one that can be multiplied 
or divided. The "A" in AX stands for Accumulator. 


The "B" in BX is for Base, and this register is 
frequently used to hold the base address of a 
collection of variables or other data. If you had a 
string of text in memory to be printed, you’d probably 
have the assembler put the address of the first 
character in BX. The rest of the string could then be 
found by referencing BX. 




The "C" in CX stands for Count, since CX is most 
often used as the counter in an assembler "For/Next" 
loop. In fact, the assembly language command Loop 
uses CX automatically to perform an operation a 
specified number of times. The comparison below 
illustrates this. 


BASIC:
FOR CX = 1 TO 5
GOSUB Beep.Tone
NEXT


Assembler: 


Mov CX, 5 
Do: Call Beep_Tone 
Loop Do 




The Loop instruction automatically branches to the label 
"Do" CX times. This is faster and more efficient than 


Mov CX, 5
Do: Call Beep_Tone
Dec CX
Cmp CX, 0
Jne Do
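

If it helps to trace the Loop instruction in a high-level language, here is a minimal sketch in Python. It is only a model of the behavior, not 8088 code, and the name run_loop is made up for this illustration. It assumes the usual case where CX starts at 1 or more.

```python
def run_loop(cx, body):
    """Model of:  Do: Call body / Loop Do  (CX is a 16-bit counter)."""
    while True:
        body()                      # Call Beep_Tone
        cx = (cx - 1) & 0xFFFF      # Loop first decrements CX (16-bit wraparound)
        if cx == 0:                 # ...and falls through once CX reaches zero
            break
    return cx

calls = []
final_cx = run_loop(5, lambda: calls.append("beep"))
print(len(calls), final_cx)         # the body ran 5 times and CX ended at 0
```

Because the decrement happens before the test, starting with CX equal to 0 does not skip the loop. In this model, as on the real processor, the count wraps around to 65535 and the body runs 65536 times.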


The DX register is a general purpose Data register, and 
is named accordingly. 


The last two "normal" registers are SI and DI. SI stands 
for Source Index, while DI means Destination Index. 
It is not hard to guess that these are well suited for 
copying data from one memory location to another. 


The 8088 has a rich set of instructions for moving and 
comparing strings, using SI and DI to show where they 
are. But again, SI and DI are still ordinary registers 
too, and can be used for common chores as well. In 
many situations it really doesn’t matter whether you 
use BX or DI or SI or AX. 




CALLING ALL INTERRUPTS 


The last item we’ll consider before moving on to the 
examples is interrupts. Suppose you have an assembly 
language program that needs to make a call to a DOS 
subroutine, for example to retrieve the system date or 
time. Calling one of our own routines is no problem,
since the assembler does all the work of calculating
the memory addresses.


But how can we know where a DOS program is located?
The truth is we can’t. In fact, if all DOS routines
were always in the same place in memory, it would be
impossible for the nice folks at Microsoft to ever make
changes or improvements.


Instead, DOS (and BIOS) routines are accessed using a 
method called interrupts. When DOS is booted, one of 
the first things it does is to place a list of its own 
internal addresses in a known place—at the beginning 
of memory. The very first four bytes in your PC’s 
memory contain the address for the first interrupt 
routine, the next four contain the second, and so 
forth. Therefore, it is not difficult to find where a 
given DOS or BIOS sub-program is located. 
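

The table lookup just described can be sketched in a few lines of Python. This is only a model: the 1 KB bytearray stands in for low memory, and the handler address stored in it is made up. The one detail added here is that each four-byte entry holds the offset word first and the segment word second, and that the numbering starts at interrupt 0.

```python
import struct

def vector_address(int_number):
    """Each table entry is 4 bytes, so entry n starts at byte n * 4."""
    return int_number * 4

def read_vector(memory, int_number):
    """Return (segment, offset) stored for the given interrupt number."""
    start = vector_address(int_number)
    # The offset word comes first, then the segment word,
    # and each word is stored low byte first.
    offset, segment = struct.unpack_from("<HH", memory, start)
    return segment, offset

# A made-up 1 KB "low memory" with one hypothetical handler address filled in.
memory = bytearray(1024)
struct.pack_into("<HH", memory, vector_address(5), 0x1234, 0xF000)

print(vector_address(0x21))       # 132, where the entry for interrupt 21h begins
print(read_vector(memory, 5))     # the segment and offset we stored for interrupt 5
```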




But the 8088 makes it even easier for us than that. To 
access the fifth interrupt routine, for example, merely 
use the instruction Int 5, and the PC’s processor will
calculate and call the location for you. When the 
routine is finished, it ends with a Return instruction 
(like BASIC), and the processor resumes where it left 
off in your program. 


This is a very powerful concept, since it allows 
programs that would otherwise not have knowledge of 
each other to communicate effectively. This also makes 
programs like print spoolers, SideKick, and SuperKey
possible. Since printing is controlled by interrupts,
a spooler program can easily look at low memory to find
where the printer routine is located, and put its own 
address there instead. Then, whenever a program calls 
that interrupt, it is unwittingly routed to the spooler. 
The possibilities are endless, and it should be clear why 
the PC’s design was such a great advance over its Apple 
and Commodore predecessors. 




All of the DOS and BIOS services are accessed by using
interrupts, though there are so many different services
that we frequently must pass additional information to
an interrupt routine. For example, most of the DOS
services are accessed through interrupt 21h. But since
there are dozens of Int 21h services, we must specify
the one we want by placing a service number in one
of the registers. AH, the High portion of AX, is
always used for this purpose.




There are two specialized registers, called BP and SP.
BP (Base Pointer) is another Base register like BX,
only it is intended for use with the stack. When you
need to get at the data on the stack, using BP is the
way to do it. The Stack Pointer (SP) holds the current
address of the stack, and should never be altered
unless you have a very good reason to do so. Since
BASIC and DOS also use the stack, messing with SP is a
sure-fire way to send your PC into oblivion!


The last four registers are the segment registers, but 
we'll mention them only briefly right now. As you 
undoubtedly know, the 8088 uses a segmented 
architecture, which means that even though it can 
utilize a megabyte of memory, it can do so only in 64K 
chunks at a time. CS holds the current Code Segment 
(where the program goes), DS holds the Data Segment 
(your memory variables), SS holds the Stack Segment, 
and ES is an Extra Segment for you to play around with. 




Each of the 8088 registers can hold two bytes (one
word), allowing you to store any integer number between
0 and 65535. (This range of values may also be considered
-32768 to 32767). But AX, BX, CX, and DX may also be used
as two separate one byte registers (0 to 255 or -128 to 127).
One byte is often sufficient, for example when manipulating
ASCII characters, and this effectively adds four more registers.


Remember, the more variables that can be kept within 
registers, the faster and more efficient a program will be. 




When using the registers separately, the two halves are 
identified by the letters "H" and "L"—for High and 
Low. That is, the high portion of AX is referred to as 
AH, while the low portion of DX is called DL. This 
would be represented with BASIC variables as 


AX = AL + 256 * AH 


or with bit patterns as 


         AX
 /               \
1011 0110 0111 0101
\       / \       /
   AH        AL


Notice that SI, DI, BP, and SP cannot be split this 
way, nor can the segment registers CS, DS, SS, and ES. 
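

The relationship between a full register and its two halves can be checked with a short Python sketch. It is an illustration only. The helper names split, join, and as_signed are made up, and the value is the bit pattern shown above.

```python
def split(ax):
    """Return (AH, AL) for a 16-bit value."""
    return (ax >> 8) & 0xFF, ax & 0xFF

def join(ah, al):
    """AX = AL + 256 * AH"""
    return al + 256 * ah

def as_signed(word):
    """Read the same 16 bits as a signed number (-32768 to 32767)."""
    return word - 0x10000 if word & 0x8000 else word

ax = 0b1011_0110_0111_0101      # the bit pattern from the diagram
ah, al = split(ax)
print(ah, al, join(ah, al) == ax)
print(ax, as_signed(ax))        # one bit pattern, two ways of reading it
```

The second line of output is the point made earlier about ranges: the register holds only bits, and whether they mean a number from 0 to 65535 or from -32768 to 32767 depends on how the program chooses to treat them.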




There are also hardware interrupts, where an external 
device such as a modem or the keyboard can signal the 
PC to stop what it’s doing, and turn its attention 
elsewhere for a moment. In fact, this is the only true 
form of interrupt, since software interrupts don’t 
actually interrupt anything, and are really just a 
specialized type of Call. Hardware interrupts follow 
the same calling convention, using the table of 
addresses in low memory. 


Now let’s look at a few example programs. The smallest 
complete QuickPak program has to be PrtSc, which 
performs a screen dump from within BASIC. 


Int 5 ;call interrupt 5 in BIOS
Ret   ;return to BASIC


Not much to it, is there? The first instruction calls 
Interrupt 5 (a routine already built into the PC’s BIOS). 
The second instruction means Return, and is exactly the 
same as the RETURN statement in BASIC. If you compare
this to the assembly language source on the QuickPak
disk, however, you’ll notice that a certain amount of
overhead is required simply to accommodate the assembler.
We’ll get to that later.


Just like the BIOS routine to print the screen, there 
are many other powerful services within the BIOS and 
DOS that we can utilize. In fact, DOS is almost a 
high-level language in its own right, having functions 
equivalent to BASIC’s PRINT, INKEY$, OPEN, and so 
forth. But we’ll get to those later too. 




SPAGHETTI CODE? 


To write a routine that converts lower case letters to 
capitals in BASIC, you might use something like this: 


IF AL$ >= "a" AND AL$ <= "z" THEN
AL$ = CHR$(ASC(AL$) - 32)
END IF


Of course, in assembly language each compare must be 
done separately, followed by a jump based on the 
results. Let’s re-phrase our BASIC example slightly: 


IF AL$ < "a" GOTO Done
IF AL$ > "z" GOTO Done
AL$ = CHR$(ASC(AL$) - 32)
Done:
(program continues)


Now a conversion to assembler is easy: 


Cmp AL, "a" ;compare AL to "a"
Jb Done     ;Jump if below to Done
Cmp AL, "z" ;compare AL to "z"
Ja Done     ;Jump if above to Done
Sub AL, 32  ;subtract 32 from AL
Done:
(program continues)


Notice how the assembler allows the use of quoted 
constants. When it sees a character or string within 
quotes, it knows we mean to use the ASCII value. Also 
notice how much jumping around is necessary to 
accomplish even the simplest of actions. 




As we saw before, it gets a bit tedious, though the 
logic is not really any different from that used in 
BASIC. Spaghetti code for sure, but this is reality 
folks. In fact, I’m frequently amused by programmers
who argue so strongly against all use of the GOTO. 
While nobody could seriously object to a well organized 
and structured programming style, if it weren’t for 
jumps and branches in assembly language, we’d all be 
out of business! 


We have already seen how to call the BIOS routine to 
perform a screen dump, so let’s continue and see how to 
call some of the other useful routines in the BIOS and 
DOS. For example, to determine the current default 
drive, DOS provides service 19h. By the way, whenever
you see an "h" after a number, that means Hex. 19h
is the assembler equivalent to &H19 in BASIC. If you
specify a number without an "h", it is assumed to be
decimal. Notice that upper/lower case does not matter.




All of the DOS and BIOS services are specified by
placing the service number in the AH register, and then
calling the interrupt routine. Some services expect additional
information in other registers as well. Thus to find the current
drive, merely put the value 19h into AH, and ask for interrupt
21h like this:


Mov AH, 19h 
Int 21h 


DOS reports the current drive by returning a number 
in AL, where 0 = drive A, 1 = drive B, etc. Our final
task will then be to pass the value in AL back to 
BASIC. As mentioned earlier, this is done by using 
the stack. Of course BASIC can’t access the stack 
directly, but it can tell our assembler routine the 
location of any variables. 


THE STACK 


So far, we’ve successfully avoided discussing the stack
in any real detail. I’m afraid the time has now come,
but I promise, this is the worst part and once you get
past it, the rest is all downhill.


The primary purpose of the stack is to retain the 
return address of a program when a subroutine is 
called. This is true not only for assembly language, 
but for BASIC as well. For example, when you tell 
BASIC to GOSUB 1200, BASIC needs to remember the 
location in memory of the next command to execute 
when the routine returns. 




It does this by placing the address of the next 
instruction on the stack before it jumps to the 
subroutine. Then when the Return instruction is 
encountered, the address to return to is available. 
The 8088 understands Calls and Returns directly, and 
will place and restore the addresses in the stack 
automatically. 


The stack is not unlike a stack of books on a table, 
but one of its great advantages is that you don’t need 
to know where in memory it is actually located. Items 
can be placed onto the stack either manually with the 
Push instruction, or automatically by the 8088 
processor as part of its handling of Calls. Values are 
retrieved from the stack with the Pop command, 
among other methods. 


Let’s say you have a BASIC subroutine that does 
something to the variable X. The code might be: 


X = 12 
GOSUB 2000 
PRINT X 




In assembly language you could first Push the value
onto the stack, and then call the subroutine. The
subroutine, expecting the value there, would retrieve
it and do its work based on that value. Remember,
the whole point is that all programs can get at the
stack, and they don’t need to know where it is in
memory. Therefore, variables are passed between BASIC
and assembly language routines by placing the addresses
on the stack.


If BASIC could get at the registers directly, it could 
pass variables through them, as we saw when telling 
DOS which of its services to do. But it can’t, and 
moreover, with a limited number of registers, only a 
few variables or addresses could be accommodated. 


So how does this relate to passing variables, if BASIC
can’t get to the stack directly? I’m glad you asked.


PASSING VARIABLES


Whenever you use the BASIC CALL instruction with a 
variable name, BASIC first pushes the address of that 
variable onto the stack, before jumping to the code 
being called. If more than one variable is specified, 
then all of the addresses are pushed. As an example, 
when the QuickPak GetDrive command is called as 
shown below, BASIC pushes the address of Drive% onto 
the stack. With access to this address, GetDrive can 
then set it to a new value, to reflect which drive is
the default.


CALL GetDrive(Drive%) 




When the GetDrive routine begins, it knows that the 
stack will be holding the address of Drive%, as well as 
the address to return to in BASIC. (Though it only 
cares about Drive%.) It also knows that it can find the 
data on the stack with the SP (Stack Pointer) register. 
The stack now looks like this: 


Address of Drive% (2 bytes)        <-- SP + 4 points here
BASIC's code segment (2 bytes)
BASIC's return address (2 bytes)   <-- SP holds this address


Notice that while we can get at the address of Drive%
through SP, an extra step is still required to get at
the data in Drive%. Let’s digress for a moment to
consider the difference between memory addresses and
values. The assembler command


Mov AX, 12 
puts the value 12 into register AX. But suppose we 
want to put the contents of memory location 12 into AX. 
We indicate this to the assembler by using brackets: 

Mov AX, [12] 


or 


Mov BX, 12 
Mov AX, [BX] 


This is an important distinction, and is similar to 
BASIC’s PEEK and POKE commands as shown below. 


BASIC                Assembler
BP = SP              Mov BP, SP
SI = Peek(BP+8)      Mov SI, [BP+8]
SI = 12              Mov SI, 12
Poke SI, 12          Mov [SI], 12




Although we can easily find the address of Drive% by
looking at SP, an extra step is required to get at the actual
data. The example below shows how to do this.
Since we don’t want to mess with SP, let’s first put SP into BP.
Then we can find where Drive% is located, and put the
information DOS gives us there.


Mov BP, SP      ;put the stack pointer into BP
Mov SI, [BP+04] ;put the address of Drive% into SI
Mov AH, 19h     ;service to get the default drive
Int 21h         ;call DOS to do it
Mov [SI], AL    ;put the answer into Drive%


Notice how brackets are used to indicate the addresses. We 
must first retrieve the address of Drive%’s address (whew!), 
before we can put the value in AL there. 


This is called indirect addressing, since we’re using a 
register to hold the address of the data. Notice also 
how the 8088 accepts addition "on the fly" when we tell 
it BP+04. But again, this is about as tough as it gets.
In fact, finding variable addresses is often more of a
pain than writing the actual routines!


"Now, was that BP + 6, or BP + 4? " 




The complete working GetDrive program has two small 
added complications, because not only can we not mess 
around with SP, but BASIC would prefer we didn’t change 
BP either. The obvious solution is to first save BP onto 
the stack before changing it, and then restore BP later 
before returning to BASIC. 


The other complication is caused by the very fact that 
BASIC put extra information (Drive%’s address) onto 
the stack. But neither is insurmountable: 


Push BP            ;save BP before changing it
Mov  BP, SP        ;put the stack pointer into BP
Mov  SI, [BP+6]    ;put the address of Drive% into SI
                   ;(now 6 because we also pushed BP)
Mov  AH, 19h       ;tell DOS we want default drive
Int  21h           ;call DOS to do it
Mov  [SI], AL      ;put the answer into Drive%
Pop  BP            ;restore BP to its original value
Ret  2             ;the 2 means discard the 2 address
                   ;bytes of Drive%


Normally when a Return command is encountered, the 8088
pops the last four bytes from the stack automatically,
and returns to the segment and address contained in those bytes.
But that would leave the 2-byte address of Drive% still
cluttering up the stack. Therefore, for each variable
that is passed with a CALL from BASIC, you must add 2
to the Return instruction in your assembler routine.
Had two variables been passed, we would have used
Ret 4 instead.
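

To see why the offset is 6 and why Ret 2 leaves the stack tidy, it can help to trace the pushes and pops by hand. The Python sketch below does just that with a toy stack. The Stack class and the four address values are made up for illustration; only the order of the pushes and pops follows the GetDrive listing.

```python
import struct

class Stack:
    """A toy 8088-style stack: SP moves down by 2 for every word pushed."""
    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.sp = size

    def push(self, word):
        self.sp -= 2
        struct.pack_into("<H", self.mem, self.sp, word)

    def pop(self):
        (word,) = struct.unpack_from("<H", self.mem, self.sp)
        self.sp += 2
        return word

    def word_at(self, address):
        (word,) = struct.unpack_from("<H", self.mem, address)
        return word

# Made-up values standing in for real addresses.
DRIVE_ADDR, BASIC_SEG, BASIC_RET, OLD_BP = 0x0E20, 0x2A00, 0x0150, 0x0FF0

stack = Stack()
sp_before_call = stack.sp
stack.push(DRIVE_ADDR)       # BASIC pushes the address of Drive%
stack.push(BASIC_SEG)        # the Call pushes BASIC's code segment...
stack.push(BASIC_RET)        # ...and then the return address
stack.push(OLD_BP)           # Push BP
bp = stack.sp                # Mov BP, SP
si = stack.word_at(bp + 6)   # Mov SI, [BP+6]

restored_bp = stack.pop()    # Pop BP
ret_offset = stack.pop()     # Ret 2 pops the return address,
ret_segment = stack.pop()    # then the segment,
stack.sp += 2                # then discards 2 more bytes (the argument)

print(hex(si), stack.sp == sp_before_call)
```

With a plain Ret in place of Ret 2, the last line of the trace would be missing and SP would end up 2 bytes short of where BASIC expects it.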


With the introduction of MASM version 5.0, Microsoft 
has made the task of interfacing assembly language to 
BASIC quite a bit easier. Two major areas they have 
simplified are dealing with the stack and getting at 
variables, and defining a routine’s segments. We’ll 
get to these in just a moment. 




MULTIPLICATION 


Now that we’ve gotten past the admittedly messy details 
of variable passing and the stack, let’s continue on. 
Another fairly simple program is DosVer, which 
retrieves the current DOS version. Like most of the 
QuickPak programs, DosVer relies on the existing DOS 
services to do its dirty work. In this program we’ll
be pushing and popping values on the stack, as well as
performing multiplication.


The syntax for calling DosVer with QuickBASIC 3 is
CALL DosVer(V%), where V% returns with the version
number times 100. That is, if the PC was booted with
DOS 3.10, then V% will hold the value 310. Dealing
with floating point numbers is much more difficult
than with integers, and doing it this way is not that
much of a compromise. (The QuickBASIC 4.0 version of
DosVer in QuickPak Professional is designed differently, and
returns the value as a function. This will be covered shortly.)


The DOS service to get the version number returns with 
two separate values—the "major" version number (3 in 
this case) and the "minor" number (10). These values 
are returned in AL and AH respectively. Our strategy 
will be to multiply AL by 100, and then add AH to the 
result. 




Unfortunately, when we use AL for multiplication, the
value 100 must be in a register; we can’t just say


Mul AL, 100


though it would sure be nice if we could.
Also, whenever AL is multiplied, the result uses the
entire AX register which destroys any value that is in
AH. Therefore, we’ll also be using BX when we add
the two together. Ready? Let’s do it.


Push BP            ;save BP
Mov  BP, SP        ;put address of the stack into BP
Mov  SI, [BP+06]   ;put address of V% into SI for later
Mov  AH, 30h       ;service 30h gets the version
Int  21h           ;call DOS to do it
Push AX            ;save a copy of the version for later
Mov  CL, 100       ;get ready to multiply AL by 100
Mul  CL            ;multiply AL * CL (AX is now 300)
Pop  BX            ;retrieve version, but in BX
Mov  BL, BH        ;put the minor part in BL for adding
Mov  BH, 0         ;clear BH, we don't want it anymore
Add  AX, BX        ;add the numbers
Mov  [SI], AX      ;move the result into V%
Pop  BP            ;restore BP
Ret  2             ;return to BASIC discarding V%


Notice the extra switch we had to do with BH and BL.
AX was pushed onto the stack to save the minor version
while AL was multiplied. When it was popped as BX, the
minor part was in BH. But we can’t add registers that
are different sizes (AX and BH). Further, any number
in the high half of a register is by definition 256
times the value of the same number in a low half. So
we had to move BH into BL, clear BH, and finally add
BX to AX.
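

The register shuffling is easier to follow with real numbers. Here is a small Python sketch that walks through the same steps as the DosVer listing for a given reply from DOS. The function name dosver and its two parameters are made up for the illustration, and Python integers stand in for the registers.

```python
def dosver(al_major, ah_minor):
    """Follow the register traffic of the DosVer listing for one DOS reply."""
    ax = (ah_minor << 8) | al_major     # what Int 21h service 30h hands back
    saved = ax                          # Push AX
    cl = 100                            # Mov CL, 100
    ax = (ax & 0xFF) * cl               # Mul CL: AX = AL * CL, old AH is lost
    bx = saved                          # Pop BX
    bh = bx >> 8                        # the minor version sits in the high half
    bl = bh                             # Mov BL, BH
    bh = 0                              # Mov BH, 0
    bx = (bh << 8) | bl                 # BX now holds just the minor version
    ax = (ax + bx) & 0xFFFF             # Add AX, BX
    return ax

print(dosver(3, 10))    # DOS 3.10
print(dosver(2, 0))     # DOS 2.00
```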




One other point worth mentioning here is the way BH is 
cleared to zero in the program above. Even though Mov 
BH,0 moves a zero value into BH, this is not the most 
efficient way to clear a register. Any time a numeric 
or string value is specified in a program (0 in this 
case), that much extra space is needed to hold the value.
Further, operations on numbers or memory variables 
are slower than operations that access only registers. 


The preferred method for clearing a register is to use the Xor 
instruction, as shown below. 


Xor AX,AX 
is the same as 


Mov AX,0 


When an Xor is performed on any two values, only those
bits that are different will end up being set. And
since the same register is used for both operands, all
of the result bits will be zero. The code for using
Xor is definitely less obvious; however, the instruction
is both smaller and faster.
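

If the Xor rule seems abstract, a couple of lines of Python will confirm it. This is only a demonstration of the arithmetic; the helper name xor16 is made up, and Python's ^ operator plays the part of the Xor instruction.

```python
def xor16(a, b):
    """Bitwise Xor of two 16-bit values: a result bit is 1 only where the inputs differ."""
    return (a ^ b) & 0xFFFF

# 1100 and 1010 differ only in the two middle bits.
print(format(xor16(0b1100, 0b1010), "04b"))

# Any value Xor'ed with itself has no differing bits, so the result is zero.
print(all(xor16(v, v) == 0 for v in (0, 1, 300, 0xB675, 0xFFFF)))
```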


STRINGS ’N’ THINGS 


The next important topic that must be covered is 
manipulating string variables. If you’ve studied your 
BASIC manual carefully, you must be aware that strings 
are stored very differently from regular numeric 
variables. BASIC allows us to find the address of a 
variable with the VARPTR function. For integer or 
floating point numbers, this value is the address of
the actual data. But for strings, VARPTR instead gives
us the address of a string descriptor.




A string descriptor is a table containing information 
about the string—that is, its length and address. But 
finding where a string lies in memory is a lot like a 
children’s game. First you find a note on the 
refrigerator. The note says, "Look in the closet." 
Then in the closet is another note that says, "Check 
the basement." By the time you’ve finally found the 
string, you'll feel like Sherlock Holmes! 


In QuickBASIC, Turbo Basic, and the IBM/Microsoft 
BASCOM compilers, a string descriptor is comprised of 
two words (four bytes) of information. The first word 
contains the length of the string, while the second 
contains the address of the first character. Consider 
the following BASIC instructions: 


X$ = "QuickPak" 
X = VARPTR(X$)




X will now be holding the address of the four-byte 
descriptor for X$. For the sake of argument, let’s say 
that X is now 2345. Addresses 2345 and 2346 will 
together contain the length of X$ which is 8, and 
addresses 2347 and 2348 will contain yet another 
address—that of the first character in X$. We can 
find the length of X$ by using the formula 


Length = PEEK(X) + 256 * PEEK(X + 1)


And the first character "Q" can be found with


Addr = PEEK(X + 2) + 256 * PEEK(X + 3)


We could then print the string on the screen like this:


FOR C = Addr TO Addr + 7
PRINT CHR$(PEEK(C));
NEXT


Therefore, this is a BASIC model for how strings are 
located by an assembly language program. When we 
call an assembler routine with a string argument, BASIC 
will first push the address of the descriptor onto the 
stack, before calling the routine. A good example to 
use now is Upper, which capitalizes the characters
in a string.


The strategy this time will be to first get the 
descriptor address from the stack. Then we’ll put the 
length into BX and the address of the string data into 
SI. We’ll step through the string starting at the end, 
decrementing BX by one for each character. When BX 
crosses zero, we’re done. 




The BASIC equivalent looks like this:


SI = VARPTR(X$)                      'address of X$ descriptor
BX = PEEK(SI) + 256*PEEK(SI+1)       'put LEN(X$) into BX
SI = PEEK(SI+2) + 256*PEEK(SI+3)     'address of 1st char in SI
Next:
BX = BX - 1                          'point to previous char in X$
IF BX < 0 GOTO Exit                  'no more, exit
AL = PEEK(BX + SI)                   'get character
IF AL < ASC("a") GOTO Next           'skip conversion
IF AL > ASC("z") GOTO Next           'ditto
AL = AL - 32                         'convert to upper case
POKE BX + SI, AL                     'put char back into X$
GOTO Next                            'and go do it again
Exit:
PRINT X$                             'show that it worked


The complete working assembly language program is 
shown below. 


        Push BP             ;save BP before changing it
        Mov  BP, SP         ;put address of stack into BP
        Mov  SI, [BP+6]     ;put address of X$ descriptor into SI
        Mov  BX, [SI]       ;put LEN(X$) into BX
        Mov  SI, [SI+2]     ;put address of 1st character into SI
Next:
        Dec  BX             ;point to next prior character
        Js   Exit           ;if "sign" is negative, BX less than 0
        Mov  AL, [BX+SI]    ;put the character into AL
        Cmp  AL, "a"        ;compare it to ASC("a")
        Jb   Next           ;jump if below to Next
        Cmp  AL, "z"        ;compare AL to ASC("z")
        Ja   Next           ;jump if above to Next
        Sub  AL, 32         ;convert AL to upper case
        Mov  [BX+SI], AL    ;put AL back into X$
        Jmp  Next           ;jump to label Next
Exit:
        Pop  BP             ;restore BP for BASIC
        Ret  2              ;return to BASIC discarding VARPTR(X$)
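
For readers who want to trace the loop away from a PC, the same
steps can be modeled in Python. This is only a sketch: the
bytearray stands in for BASIC's data segment, and the function
name and the addresses 0 and 16 are made up for the illustration.

```python
import struct

def upper_in_place(memory, descriptor_addr):
    """Model of the Upper routine: capitalize a string via its descriptor."""
    # A descriptor is two little-endian words: the length, then the data address.
    length, data_addr = struct.unpack_from("<HH", memory, descriptor_addr)
    bx = length
    while True:
        bx -= 1                           # Dec BX
        if bx < 0:                        # Js Exit
            break
        al = memory[data_addr + bx]       # Mov AL,[BX+SI]
        if al < ord("a") or al > ord("z"):
            continue                      # Jb Next / Ja Next
        memory[data_addr + bx] = al - 32  # Sub AL,32 then Mov [BX+SI],AL

# Hypothetical layout: descriptor at address 0, string data at address 16.
memory = bytearray(64)
text = b"QuickPak 4.0"
memory[16:16 + len(text)] = text
struct.pack_into("<HH", memory, 0, len(text), 16)

upper_in_place(memory, 0)
result = bytes(memory[16:16 + len(text)])
print(result)
```

Only the lower case letters change; digits, spaces, and
punctuation fall outside the "a" to "z" range and are skipped.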


WHAT’S YOUR SIGN? 


Notice the use of a new form of conditional jump, JS,
which means Jump on Sign. Here we’re testing the
"sign" of the number in BX, and jumping if it is
negative. Though it hasn’t been mentioned before, a
conditional jump doesn’t always have to follow a
compare. While a compare will set the "flags" in the
8088 that indicate whether a particular condition is
true, so will several other instructions. Some of
these are Add, Sub, Dec, and Inc, though not Mov.


So instead of having to include an explicit comparison:


    Dec BX          ;decrement BX
    Cmp BX, 0       ;compare it to zero
    Jl  Exit        ;jump if less to Exit


All that is really needed is


    Dec BX
    Js  Exit


since the decrement instruction sets the Sign Flag
automatically, just as if a separate compare had been
performed.


Understand that this comparison assumes the use of 
"signed" numbers, where the values range from -32768 
through 32767. Where Jump if Less and Jump if 
Greater consider a negative number to be less than a 
positive one, two other forms are available that treat 
all numbers as unsigned. Jump if Above and Jump if 
Below (Ja and Jb respectively), ignore the sign of a 
number, and thus consider all numbers as being within 
the range 0 to 65535. 
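
The difference between the two views of a 16-bit number can be
sketched in Python. The helper names below are invented for the
illustration; they only mimic how the Sign Flag and the signed
and unsigned jumps interpret the same bit pattern.

```python
def as_signed16(value):
    """Read a 16-bit pattern the way Jl and Jg do (two's complement)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def dec16(value):
    """Model Dec on a 16-bit register: return the new value and the Sign Flag."""
    result = (value - 1) & 0xFFFF
    return result, bool(result & 0x8000)

bx, sign_flag = dec16(0)            # decrementing zero wraps around to 0FFFFh

# The same pattern compared against 1 in both ways:
signed_less = as_signed16(bx) < 1   # what Jl would see: -1 is less than 1
unsigned_below = bx < 1             # what Jb would see: 65535 is not below 1

print(hex(bx), sign_flag, as_signed16(bx), signed_less, unsigned_below)
```

This is why Js works in the Upper loop: the moment BX is
decremented past zero its top bit becomes set, and the Sign Flag
reports it.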


DOS STRINGS 


When string information is passed to a DOS routine, for
example when giving the name of a directory to change
to, the string must end with a CHR$(0). In DOS
terminology this is called an ASCIIZ string. (Not to
be confused with a Ctrl-Z which marks the end of a
file.) DOS does not use string descriptors like BASIC
does, so this is the only way it can tell when it has
reached the end. By the same token, when DOS returns
a string to a calling program, it will mark the end with
a zero byte.


If you examine the source code for the QuickPak 
routines that read a directory (ReadFile, Exist, etc.), 
you will notice that the string passed from BASIC is 
first copied into a temporary holding area, expressly 
for the purpose of concatenating a CHR$(0) to the end.
If this were not done, it would be up to the calling 
program to do so, making the routines more difficult 
to use. 


Rather than duplicate the hundred or so bytes needed
for the copy in each DOS routine, the QuickPak routines
instead use a common data area. This temporary memory is
defined in a file named EXTERNAL.ASM.
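
The copy-and-terminate step can be pictured with a short Python
model. It is a sketch under assumptions: the function names are
invented, the bytearray plays the part of BASIC's data segment,
and the real QuickPak code in EXTERNAL.ASM is not reproduced here.

```python
import struct

def to_asciiz(memory, descriptor_addr):
    """Copy a BASIC string out through its descriptor and append CHR$(0)."""
    length, data_addr = struct.unpack_from("<HH", memory, descriptor_addr)
    return bytes(memory[data_addr:data_addr + length]) + b"\x00"

def from_asciiz(buffer):
    """Read a string that DOS returned: everything before the first zero byte."""
    end = buffer.index(0)
    return buffer[:end]

# Hypothetical layout: descriptor at address 0, the path text at address 8.
memory = bytearray(32)
path = b"C:\\UTIL"
memory[8:8 + len(path)] = path
struct.pack_into("<HH", memory, 0, len(path), 8)

for_dos = to_asciiz(memory, 0)
back_again = from_asciiz(for_dos + b"leftover bytes")
print(for_dos, back_again)
```

Note that the original string is never altered; only the
temporary copy gains the extra zero byte.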


ARRAYS 


The last important topic related to finding and 
manipulating variables is how arrays are stored in 
memory. Since arrays are such an important part of 
programming in any language, an understanding of their 
internal structure is mandatory. Let’s first look at 
integer arrays. 


When BASIC encounters the statement DIM X%(100) in
your program, it allocates a contiguous block of memory
202 bytes long. (Unless you first set Option Base to
1, dimensioning an array to 100 means 101 elements.)
The first two bytes in this block will hold the data
for X%(0), the next two bytes hold X%(1), and so forth.
When you ask VARPTR to find X%(0), the address it
returns is the start of this block of memory.
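
The arithmetic behind those numbers is simple enough to check
in a few lines of Python. The function names and the base
address 1000h are assumptions chosen for the example.

```python
ELEMENT_SIZE = 2    # an integer (%) element occupies two bytes

def block_size(upper_bound, option_base=0):
    """Bytes allocated by DIM X%(upper_bound)."""
    return (upper_bound - option_base + 1) * ELEMENT_SIZE

def element_address(base_address, index, option_base=0):
    """Address of X%(index), given the address of the first element."""
    return base_address + (index - option_base) * ELEMENT_SIZE

base = 0x1000       # hypothetical value returned by VARPTR(X%(0))
print(block_size(100), block_size(100, option_base=1))
print(hex(element_address(base, 1)), hex(element_address(base, 100)))
```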


The address of subsequent array elements may then be
easily computed from this base address. Notice,
however, that with the introduction of QuickBASIC and
its support for large numeric arrays, extra care is
required to obtain both the address and the segment
of those arrays. This will be discussed in a moment.


String arrays are structured in a similar fashion, only
for each element that is dimensioned, four bytes are
set aside. These bytes comprise a table of descriptors,
and contain the length and address words for each
element in the string array. But the important point
is that once you know where one element or descriptor
lies, it is easy to find all of those that are adjacent.
Here’s a QuickBASIC example for locating X$(15), based
on knowing the VARPTR for X$(0).


    DIM X$(100)
    X$(15) = "Find me"

    V = VARPTR(X$(0))
    V = V + 4 * 15

    L = PEEK(V) + 256 * PEEK(V + 1) : PRINT "Length ="; L
    Address = PEEK(V + 2) + 256 * PEEK(V + 3)

    PRINT "String = ";
    FOR X = Address TO Address + L - 1
        PRINT CHR$(PEEK(X));
    NEXT
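
The same lookup can be followed in a Python model of the
descriptor table. The table address 100 and the data address
600 are invented for the sketch; a real program would get the
table address from VARPTR.

```python
import struct

DESCRIPTOR_SIZE = 4     # a length word followed by an address word

def read_element(memory, table_addr, index):
    """Fetch one string-array element through its four-byte descriptor."""
    descriptor = table_addr + DESCRIPTOR_SIZE * index
    length, data_addr = struct.unpack_from("<HH", memory, descriptor)
    return bytes(memory[data_addr:data_addr + length])

memory = bytearray(1024)
table_addr = 100        # hypothetical VARPTR(X$(0)); 101 descriptors follow
text = b"Find me"
memory[600:600 + len(text)] = text
struct.pack_into("<HH", memory, table_addr + DESCRIPTOR_SIZE * 15, len(text), 600)

found = read_element(memory, table_addr, 15)
empty = read_element(memory, table_addr, 14)
print(found, empty)
```

An element that was never assigned has a length word of zero,
so it reads back as an empty string.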


FIXED-LENGTH STRINGS 


With the introduction of QuickBASIC 4.0 and BASCOM 6, 
BASIC now provides fixed-length strings, as well as 
user-definable TYPE variables. Understanding how these 
are passed to an external routine is important, so let’s 
look at the two different ways that BASIC handles them. 


Fixed-length strings do not use a string descriptor,
which you might think would make them more difficult
to manipulate from assembly language. But whenever a
fixed-length string is used as an argument to an
assembler routine or BASIC subprogram, it is first
copied into a temporary "normal" string.


When the routine returns to the main program, BASIC 
then copies the characters back into the original 
fixed-length string. Thus, any routine written in 
assembly language that expects a descriptor will work 
correctly, regardless of the type of string being sent. 


When an element in a TYPE array is being sent to an 
assembly language routine, however, you may also tell 
BASIC to pass the element as is. This means that the 
external routine must know how many bytes comprise 
the variable type, and how those bytes are to be 
interpreted. For example, if the TYPE member is single 
precision, it will be four bytes coded in the IEEE 
numeric format. 


Therefore, if you do not want BASIC to create a 
temporary string from one of a fixed length, you must 
first define the string as a TYPE like this: 


    TYPE FLen
        X AS STRING * 20
    END TYPE


Though this appears to be the same as simply defining
X as a string with a fixed length of 20, there is one
important difference: declaring it as a TYPE lets you
tell BASIC to preserve it when calling an external
subroutine. The entire TYPE variable will be sent to
the routine, as long as the ".X" part that defines it
as a string is not used. Here’s an example based on
the FLen type that was defined above:


    DIM FString AS FLen                'now FString is a TYPE
    FString.X = "This is a test"       'assign the "X" portion
    CALL Routine(FString)              'call Routine without ".X"


Here, the address of the first character in the string
will be passed to the assembly language routine, as
opposed to the address of a string descriptor. We have
told BASIC to call Routine, and to pass it the entire
FString TYPE, but without interpreting the ".X" string
component.


This next example, though, will cause BASIC to first 
convert the string to a normal string before calling 
Routine: 


    CALL Routine(FString.X)            'call Routine with ".X"


Because the ".X" part is included in the call, BASIC
knows that the fixed-length string portion of the TYPE
is being passed. BASIC will first make a copy of the
string, and then set up a temporary string descriptor
that points to the copy.


Passing an element from a TYPE array uses similar
logic:


    DIM Array(100) AS FLen                  'make the FLen array
    Array(13).X = "Here's another test"     'assign element 13
    CALL Routine(Array(13))                 'pass entire element


If we instead use


    CALL Routine(Array(13).X)               'pass the string


then BASIC will see that it is a fixed-length string 
being passed, and will thus make the temporary copy. 


The assembler routine below expects a fixed-length 
string with a length of twenty to be passed to it, and 
it will copy the characters to the upper left corner of 
a color monitor. 


Routine Proc Far

        Push BP
        Mov  BP,SP          ;access the stack as usual
        Mov  SI,[BP+06]     ;get the address for the string
        Mov  DI,0           ;upper left corner of the screen
        Mov  AX,0B800h      ;the segment for a color monitor
        Mov  ES,AX          ;put it into ES
        Mov  CX,20          ;prepare to move 20 characters
        Cld                 ;insure that moves are forward
More:
        Movsb               ;move a byte to screen memory
        Inc  DI             ;skip over the attribute byte
        Loop More           ;loop until done
        Pop  BP             ;return as usual
        Ret  2
Routine Endp
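
The reason for the extra Inc DI is easier to see in a model of
text-mode screen memory, where every character cell is a
character byte followed by an attribute byte. The Python sketch
below uses a bytearray in place of segment B800h; the function
name and the starting screen contents are assumptions.

```python
def copy_to_screen(screen, text, count=20):
    """Model of the More loop: write count characters into character cells."""
    di = 0                      # Mov DI,0
    for si in range(count):     # Loop More, with CX = count
        screen[di] = text[si]   # Movsb stores the byte at ES:DI...
        di += 1                 # ...and advances DI (SI advances as well)
        di += 1                 # Inc DI skips over the attribute byte

# An 80 x 25 screen: each cell is a blank (20h) with attribute 7.
screen = bytearray([0x20, 0x07] * 2000)
message = b"This is a test".ljust(20)

copy_to_screen(screen, message)
shown = bytes(screen[0:40:2])
print(shown)
```

The attribute bytes are never written, so the text appears in
whatever colors were already on the screen.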


Notice the use of a new instruction, Movsb. Movs takes
either a Byte or Word argument (b or w), and copies 
data from the segment and address held in DS:SI to the 
address pointed to by ES:DI. Movs is particularly 
useful because it increments both SI and DI to the next 
higher addresses automatically. If a byte is moved, SI 
and DI are incremented by one. And if a word is being 
moved, they are instead incremented by two. 


You may also tell Movs to instead decrement the 
addresses by setting the direction flag. The direction 
flag is set to go backwards with the instruction Std,
and cleared to forward with Cld. 


Two other useful string instructions are Cmps and Scas
which compare and scan strings respectively. Like
Movs, these take either a byte or word argument, and
increment (or decrement) their address registers
automatically: Cmps adjusts both SI and DI, while Scas
uses and adjusts only DI.


HUGE ARRAYS AND SEGMENTS 


Also added with QB 4 and BASCOM 6 are what Microsoft
refers to as huge arrays. A huge array is simply any
array whose total size exceeds 64K. However, there are
still several restrictions. For example, the number of
elements is limited to 32767, which precludes having a
huge integer array. Also, in order to use huge arrays
in a program, you must start QuickBASIC (or BASCOM)
with the /AH command line option.


Unlike strings, string arrays, and simple (non-array) 
variables, huge arrays are always located outside of 
BASIC’s normal data segment. This means that an 
assembler routine needs some way to know both the 
address and the segment for an array element that is 
passed to it. This brings up an interesting point. 


Normally, when an assembly language routine begins, it
can assume that the DS register is holding the correct
segment for any variables that will be passed to it.
But this obviously will not be true for huge arrays.
Further, Microsoft has decided not to document the way
that arrays are passed, so we can’t simply use empty
parentheses as is done when calling a BASIC subprogram.


The approach we have taken with the QuickPak routines
that operate on an entire array is to specify some
particular starting element. The routine can then
assume that all of the subsequent elements lie before
or after it in memory. But that no longer works with
the newest versions of BASIC.


If you call an assembly language routine like this:


    CALL Routine(Array(0))


BASIC first makes a copy of the array element into a
temporary variable in near memory, and the address of
the copy is then passed to the routine. Thus, while
the routine can still receive an array element’s value,
it has no way to determine its true address. And
without the address, there is no way to get at the rest
of the array.


Since being able to pass an entire array is obviously 
important, Microsoft has now added two new options to 
the CALL command. The SEG key word indicates that 
both the address and the segment are to be passed on the 
stack, as well as telling BASIC not to make a copy of 
the array element. (Notice that in Turbo Basic, all 
variables are passed as segmented addresses. The 
assembler interface to Turbo Basic will be discussed 
separately later.) 


SEG is used with an array element (or any variable for 
that matter) like this: 


CALL Routine(SEG Array(0)) 


BASIC first pushes the segment of the element onto the 
stack, followed by the element’s address. By pushing 
them in this order, the assembler routine can conveniently 
use either LDS (Load DS) or LES (Load ES) to get both 
of them in one operation: 


    LES DI, [BP+06]


LES moves four bytes in one operation, placing the 
lower word into the named register (DI in this case), 
and the higher word at [BP+08] into ES. LDS works in 
a similar manner, except that the higher word is 
instead moved into DS. 
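
What LES does with the four bytes on the stack can be sketched
in Python. The frame contents below are made-up values, and the
helper names are invented; the layout assumes the usual far call
followed by Push BP, so the return address sits at [BP+02] and
[BP+04].

```python
import struct

def les(stack, bp_offset):
    """Model of LES reg,[BP+n]: low word to the register, high word to ES."""
    offset, segment = struct.unpack_from("<HH", stack, bp_offset)
    return segment, offset

def physical_address(segment, offset):
    """The 8088 forms a 20-bit address from segment * 16 + offset."""
    return (segment * 16 + offset) & 0xFFFFF

# Words at [BP+00] upward: saved BP, return offset, return segment,
# then the element's address and the element's segment (hypothetical values).
frame = struct.pack("<HHHHH", 0x1234, 0x0100, 0x2000, 0x0010, 0x5000)

es, di = les(frame, 6)
print(hex(es), hex(di), hex(physical_address(es, di)))
```

Because BASIC pushed the segment first, it ends up at the
higher address, which is exactly where LES expects to find it.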


Because normal (not fixed-length) string arrays are 
always located in near memory, using SEG should not 
be necessary. Indeed, with previous versions of 
QuickBASIC, we could always count on the correct 
address for a string array element to be passed if it 
was specified as part of a call. But beginning with 
QuickBASIC 4, a copy of the string element is passed, 
just like with numeric arrays. As we are about to see, 
a different strategy is required for string arrays. 


In earlier QuickBASIC and BASCOM versions, VARPTR
always returned a single-precision offset from the
current data segment to the named variable. But now,
VARPTR returns an integer address. Therefore, the
only way an entire string array may be passed is to use
VARPTR to determine the array element’s true address,
and then use BYVAL to send the integer value of that
address to the assembler routine:


CALL Routine(BYVAL VARPTR(Array$(Element))) 


Unlike the usual way that BASIC passes a variable by 
pushing its address on the stack, BYVAL (By Value) 
instead pushes the actual data. Of course, the value 
of a VARPTR is what we really wanted to begin with. 


MULTIPLE SEGMENTS 


One final related topic is the way that huge TYPE and
fixed-length string arrays are organized. Accessing
all of a huge array can be fairly difficult to begin
with, because the elements will be spanning multiple
segments. It is up to the assembler program to know
which elements are in what segment, and to manually
adjust those segments as needed.


With any of the standard BASIC variable types, a single
array element can never cross a segment boundary. Each
type of variable will evenly fit into a 64K segment, so
no matter how large the array is, the start of a new
segment is always the start of a new element. This
works out nicely, and causes the first element in an
array to always start at the first address in the segment.


For example, VARPTR(Array%(0)) will always be 0,
VARPTR(Array%(1)) will be 2, and so forth.


But with a fixed-length string or TYPE array, it could 
be possible for an array element to straddle a segment 
boundary. This would be a real mess, so Microsoft has 
gone to the added effort to prevent this from happening. 


When an array is dimensioned and the size of each 
element is not evenly divisible into 64K, QuickBASIC 
will fudge the first element to start at whatever 
address is necessary to cause the 64K boundary not to 
split an element. The only problem is that in those 
cases, the array can never be larger than two segments. 
Otherwise, the inevitable break would still occur at 
the start of the third segment. 
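
One way to picture the adjustment is with a little arithmetic.
The Python sketch below assumes that the starting offset is
simply the remainder of 64K divided by the element size; that
formula is an inference from the description above, not a
documented rule, and the function names are invented.

```python
SEGMENT_SIZE = 65536

def first_element_offset(element_size):
    """Assumed starting offset that lets an element end exactly at 64K."""
    return SEGMENT_SIZE % element_size

def straddles(start, element_size, index):
    """True if the element's first and last bytes lie in different segments."""
    first_byte = start + index * element_size
    last_byte = first_byte + element_size - 1
    return first_byte // SEGMENT_SIZE != last_byte // SEGMENT_SIZE

size = 20                                   # for example, STRING * 20
start = first_element_offset(size)          # 16 for a 20-byte element

split_at_first_boundary = any(straddles(start, size, i) for i in range(3277))
split_at_second_boundary = straddles(start, size, 6552)
split_without_adjustment = straddles(0, size, 3276)
print(start, split_at_first_boundary, split_at_second_boundary)
```

With a 20-byte element the first boundary is clean, but the
element that reaches the end of the second segment is cut in
two, which matches the two-segment limit described above.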


VARIABLES IN ASSEMBLY LANGUAGE 


Earlier we mentioned using memory space to hold text 
and numeric variables. For assembly language programs 
of any significant size, an understanding of how to do 
this is essential. 


The first step is to set aside the amount of space that 
will be needed with the assembler instructions DB and 
DW. These stand for Define Byte and Define Word 
respectively, and they allocate either one byte of 
storage or two. Notice that these are not commands 
that the 8088 processor will execute, rather they tell 
the assembler to leave room for the data that follows. 


Some examples are shown below: 


    DB 12h                      ;one byte - 12h
    DB 15 Dup(0)                ;fifteen bytes - 0
    DW ?                        ;two bytes - both 0
    DB 'Test message', 13, 10   ;message, CR, LF


In the first example one byte of memory is allocated, 
and the value 12 Hex is placed there at assembly time. 
The second example illustrates the Dup (duplicate) 
command, and means "set aside fifteen bytes filling 
each with the value zero". 


Filling an area with zeroes can also be accomplished 
with a question mark, and is frequently used when the 
value that will eventually end up there is not known in 
advance. Both do almost the same thing, but using
"?" implies an unknown, as opposed to an explicit zero.
You may use whichever method seems more appropriate 
at the time. The last example shows how text may be 
specified, as well as combining values in a single 
statement. 
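
To see how many bytes those four definitions occupy, they can
be imitated in Python. The helper functions db, dup, and dw are
inventions for this sketch, and "?" is modeled as zero, as the
text describes.

```python
import struct

def db(*values):
    """Define Byte: accept numbers and text, return the bytes laid down."""
    out = bytearray()
    for value in values:
        if isinstance(value, (bytes, bytearray)):
            out += value
        else:
            out.append(value)
    return bytes(out)

def dup(count, value=0):
    """count Dup(value): the same byte repeated count times."""
    return bytes([value]) * count

def dw(value=0):
    """Define Word: two bytes, low byte first."""
    return struct.pack("<H", value)

data = db(0x12) + dup(15, 0) + dw() + db(b"Test message", 13, 10)
print(len(data), data)
```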


Since the assembler lets you use names for your data, 
fetching or storing values can be done with the normal 
Mov instruction like this: 


ErrorCode DB ? 
Mov ErrorCode, AL 


This puts the contents of register AL into memory 
location ErrorCode. Getting it back again later is 
just as easy: 


Mov DH, ErrorCode 


Sometimes the assembler needs a little help, though.
When we move AL or DH in and out of a memory location,
the assembler knows that we are dealing with a single
byte. And if we specify BX or SI, the assembler
understands this to mean two bytes, or one word. But
when actual numbers are used, the size of the value is
not always obvious (to the assembler at least).


Consider the following: 


Mov Counter, 3Ch 


Does this mean that we want to put the value 3Ch into
the byte at location Counter, or the value 003Ch into
the word at that address? You might think that if
Counter had been defined with a DB, the assembler would
be smart enough to understand that we mean byte, but
it’s not. (Actually, this was finally corrected in MASM
version 5.0.) The assembler does allow the use of labels
to indicate byte or word, but we’ll use a different
method here.


Mov Byte Ptr Counter, 15 


The Ptr (pointer) operator takes either a Byte or Word
argument, and eliminates any possible misunderstanding.
In the example above, we are specifying that the
variable Counter is to be treated as a single byte. Had
we used Word instead, a 15 would be placed into the byte
at location Counter, and a zero would be put into the
byte immediately following it. (Words are always stored
with the low-byte before the high-byte in memory.)
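
The effect of the two sizes on memory can be shown with a pair
of two-byte buffers in Python. The function names are made up,
and the buffers start out filled with AAh so that any byte left
untouched is easy to spot.

```python
import struct

def mov_byte_ptr(memory, addr, value):
    """Mov Byte Ptr [addr], value: exactly one byte is written."""
    memory[addr] = value & 0xFF

def mov_word_ptr(memory, addr, value):
    """Mov Word Ptr [addr], value: two bytes are written, low byte first."""
    struct.pack_into("<H", memory, addr, value & 0xFFFF)

as_byte = bytearray(b"\xAA\xAA")
as_word = bytearray(b"\xAA\xAA")

mov_byte_ptr(as_byte, 0, 15)
mov_word_ptr(as_word, 0, 15)
print(as_byte.hex(), as_word.hex())
```

The byte form leaves the neighboring byte alone, while the word
form overwrites it with zero.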


Manipulating strings is just a bit more complicated,
since we often need to specify the address where the
characters are stored. For example, to move or print a
string, its address is usually placed into one of the
8088’s registers.


    Where_To DB 'Rhode Island'
    Lea  DX, Where_To


There are two ways to get at the memory addresses you
have set aside: with the Offset assembler operator,
or by using the Lea (Load Effective Address) 8088
command. If you look at program listings in books and
magazines you will usually see Offset used, since many
of those are stand-alone utility programs written in a
.COM format. However, Offset does not always work in
the .EXE programs that are added to QuickBASIC. In
many instances both do the same thing, but for these
examples we will use Lea.


The syntax for Lea is very similar to that used for 
Mov, except you are telling the 8088 to put the address 
of the specified data into a register, rather than the 
data itself. As an example we’ll set up a message, and 
then use DOS service 9 to print it. Service 9 expects 
that the address of the string to be printed will be in 
DX when it is called. 


    ErrorMsg DB 'Put a disk in the drive, silly!$'
    Lea  DX, ErrorMsg
    Mov  AH, 9
    Int  21h


Notice the dollar sign placed at the end of the 
message. Even though almost all of the other DOS 
services use a zero byte to indicate the end of a 
string, this particular service uses a dollar sign. As 
you can see, even Microsoft makes silly mistakes 
sometimes! Other occasions that you will need to use 
Lea are when moving or comparing strings. 
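
The way service 9 decides where the message ends can be
imitated in Python. This is only a model: the function name is
invented, and it returns the text rather than printing through
DOS.

```python
def dos_print_string(memory, dx):
    """Model of DOS service 9: take characters from DX up to the dollar sign."""
    end = memory.index(b"$", dx)
    return memory[dx:end].decode("ascii")

# Hypothetical memory: two unrelated bytes, the message, then more data.
memory = b"\x00\x00Put a disk in the drive, silly!$more data"

shown = dos_print_string(memory, 2)
print(shown)
```

One consequence is that a message printed with this service
cannot itself contain a dollar sign.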


ADDING TO BASIC 


Now let’s look at some of the ways an assembly language 
program can be added to BASIC. The manuals that come 
with the BASIC interpreter suggest several possible 
approaches. 


One is to BLOAD the assembled program from a disk
file, though that doesn’t always work with early
versions of QuickBASIC. Another method is to read the
bytes of assembler code from DATA statements, and place
those bytes into either a string variable or an integer
array.


[Illustration: "A little of this, and a little of that..."]


That’s better than using BLOAD, but it takes at least
twice as much space away from BASIC, since each byte
appears in the data statement as well as in the array
or string. But for the sake of completeness, we’ll
look at how it’s done.


Here’s the PrtSc routine in a string variable:


    Code$ = SPACE$(3)

    FOR X = 1 TO 3
        READ Dat$
        MID$(Code$, X, 1) = CHR$(VAL("&H" + Dat$))
    NEXT

    X = VARPTR(Code$)
    Addr% = PEEK(X + 2) + 256 * PEEK(X + 3)
    CALL ABSOLUTE(Addr%)
    END

    DATA CD, 05, CB
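
The three DATA items are nothing more than the machine code
for the routine written as hex digits. A Python sketch of the
conversion follows; the variable names are chosen for the
example.

```python
DATA = "CD, 05, CB"     # the same items as the DATA statement

# VAL("&H" + Dat$) in BASIC corresponds to int(text, 16) here.
code = bytes(int(item.strip(), 16) for item in DATA.split(","))

# CDh is the opcode for Int, 05h is the interrupt number,
# and CBh is a far return.
print(code.hex(" "), len(code))
```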


Not a pretty story, is it? First the code bytes are 
read and assigned into a string. Then the address of 
the string descriptor must be found. Finally the 
beginning of the string data must be located and 
called. There’s got to be a better way—and there is. 


The compiler allows us to create external routines that 
can simply be called by name. Contrast the above mess 
to the QuickPak syntax for calling PrtSc: 


CALL PrtSc 


Ah, now we’re getting somewhere. Only we must use
the IBM or Microsoft Macro Assembler if this method is
to be successful. The Macro Assembler allows us to
give a name to an assembly language program, which the
QuickBASIC compiler can access when the program is
linked and run. The following section will explain how
to do this. We’ll stick with PrtSc as the example,
since it’s such a simple program.


Code    Segment Byte Public 'Code'
        Assume  CS:Code
        Public  PrtSc

PrtSc   Proc    Far

Begin:  Int     5           ;call BIOS interrupt 5
        Ret                 ;return to BASIC

PrtSc   Endp
Code    Ends


The first three lines tell the assembler that the code
will end up in the Code Segment, and to make the name
"PrtSc" public. The fourth line defines the start of a
procedure. The actual code occupies the next two
lines. Of course, we must tell the assembler where the
procedure ends, which in this case is also the end of
the code segment.


Had we included several procedures within the block of
code, each procedure would show a start and end, but
there would only be a single Code section. (See the
source code for SYSTIME.ASM on the QuickPak disk
for an example of two procedures within one block
of code.)


Even if you don’t completely understand the reason for 
all of this (I wonder myself sometimes), all you really 
have to do is copy the declaration lines as shown, and 
it will all work out just fine. 


One important note on procedures, though, is the
difference between a Far procedure and a Near one.
Any external routine that is called from BASIC will be
a Far procedure. This means that the procedure does
not necessarily have to be within the same code segment
as the main BASIC program. (Which allows the combined
programs to exceed the usual 64K limit.)


When BASIC executes your CALL command, it uses a two- 
word address as the location to jump to. One of the 
words contains a segment, and the other an address 
within that segment. Conversely, when your program 
finally returns, the 8088 must know to remove two words 
from the stack—a segment and an address—to find where 
to return to in the calling BASIC program. 


A near procedure, on the other hand, would call an
address that is only one word long. And when the
procedure returns, only a single word would be popped
from the stack. Again, the assembler does the bulk of
the dirty work for you. You just have to remember to
indicate Far.
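
The size of the return address is also what decides where the
parameters are found relative to BP. The Python sketch below
lists the words on the stack after the usual Push BP and
Mov BP,SP; the function names and the labels in the list are
invented for the illustration.

```python
def stack_frame(params, far=True):
    """Map each offset from BP to the word stored there."""
    pushed = list(params)                   # BASIC pushes one word per argument
    if far:
        pushed.append("return segment")     # a far Call pushes the segment...
    pushed.append("return offset")          # ...and then the offset
    pushed.append("saved BP")               # Push BP
    # The last word pushed is at [BP+00]; each earlier one is two bytes higher.
    return {2 * i: name for i, name in enumerate(reversed(pushed))}

def bytes_to_discard(params):
    """The n in Ret n: two bytes for every parameter word."""
    return 2 * len(params)

far_frame = stack_frame(["Var1", "Var2"])
near_frame = stack_frame(["Var1", "Var2"], far=False)
print(far_frame)
print(near_frame)
```

With a far call the last parameter is at [BP+06], as in the
routines shown earlier. Had the procedure been near, the same
parameter would be at [BP+04] instead.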


SIMPLIFIED SEGMENTATION 


Fortunately, Microsoft recognized what a pain dealing 
with segments and procedures can be, and has now 
introduced a scheme called Simplified Segmentation 
beginning with MASM version 5. Rather than require 
the programmer to define the various code and data 
segments, all that is needed are a few simple key words. 


The first is .Model Medium, which tells MASM that the 
procedures that follow will be Far. Used in conjunction 
with .Code and .Data, .Model Medium causes any data 
you define to be placed in a group named DGROUP. 


By using the name DGROUP, the linker will automatically
gather up all of your DB’s and DW’s, and put them into
the same data segment that BASIC uses. While this has
the disadvantage of impinging on BASIC’s string space,
it also means that on entry to the routine, DS will
always hold the correct segment.


Also introduced with MASM 5.0 is a series of useful
macros that simplify creating procedures and accessing
BASIC variables. These are contained in the file
MIXED.INC on the MASM distribution disk, and three
of the macros are used throughout QuickPak.


HProc defines the start of a procedure, and also lets 
you declare the parameters that the assembler routine 
will be accessing. The syntax for HProc is as follows: 


    HProc routinename, Var1:Ptr, Var2:Ptr ...


Besides establishing routinename as a procedure,
HProc also declares it to be public so a BASIC program
can call it by name. The variables Var1, Var2 and so
forth are simply the incoming parameters, listed in the
order in which they will be given in the calling
program.


You may optionally tell HProc that one or more
registers need to be saved with the <Uses> statement.
In a BASIC program, only two registers must be saved:
DS and BP. HProc will save BP automatically, but if
your program will be changing DS, you should invoke
HProc like this:


    HProc routinename, <Uses DS>, Var1:Ptr, Var2:Ptr ...


Once a procedure has been defined with HProc, the code
to Push BP and move SP into BP will be generated
automatically. Even better, the macro will keep track
of the placement of the variables on the stack. That
is, rather than having to use:


    Mov SI, [BP+08]


HProc lets you write:


    Mov SI, Var1


When many parameters are being passed, this can help 
eliminate an important source of bugs. Moreover, while 
a program is under development you may find yourself 
adding or deleting parameters. By referencing them by 
name, you won’t have to go back over the code searching 
for all occurrences of [BP+08] changing them to [BP+10] 
or whatever. 


One other point worth noting is that a bug in the HProc
macro generates a warning if you try to get at
two parameters at once with LDS or LES. Even though
a statement such as:


    LES DI, Var2


should be perfectly legal, MASM will tell you that an
illegal size for the item was used. Fortunately, this
is just a warning which you can ignore; the resultant
code will still work just fine.


The second important macro is HRet, which simply 
creates a "Ret n" instruction, where "n" is the correct 
value for the number of variables that are being 
passed. 


The final macro is HEndp, which serves the same 
purpose as the usual Endp, except you are not required 
to give the procedure’s name again. 


ASSEMBLER FUNCTIONS 


Assembler functions are new with QuickBASIC 4.0 and 
BASCOM 6, and allow an external routine to return a 
value directly. We have used this feature extensively 
in QuickPak Professional, in those cases where 
returning a value is appropriate. For example, the 
Monitor function reports the type of monitor that is
currently active, without requiring a variable to be 
passed. 


Creating an assembler function is quite easy. However,
it must be declared in the BASIC program before it
is referenced. After all, if your program requests the
current monitor type like this:


X = Monitor 


how could BASIC know whether you mean the variable 
Monitor, or a function with that name? 


To return an integer value, all an assembler function
needs to do is to place the outgoing value in the AX
register before returning to BASIC. If the function is
to return a long integer (or single precision value),
both DX and AX will be used. In this case, DX always
holds the higher word, while AX holds the lower one
(DX:AX). An example of this may be found in the
source listing for the FileSize function.
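
Splitting a long integer across the two registers is plain
arithmetic, sketched here in Python. The function names are
invented, and the value 100000 is just an example of a number
too large for one word.

```python
def to_dx_ax(value):
    """Split a 32-bit long integer into its high (DX) and low (AX) words."""
    value &= 0xFFFFFFFF
    return (value >> 16) & 0xFFFF, value & 0xFFFF

def from_dx_ax(dx, ax):
    """Rebuild the signed long integer that BASIC sees in DX:AX."""
    value = (dx << 16) | ax
    return value - 0x100000000 if value & 0x80000000 else value

dx, ax = to_dx_ax(100000)
print(hex(dx), hex(ax), from_dx_ax(dx, ax))
```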


One advantage to using assembler functions (besides
requiring one less parameter to be passed) is that any
type of variable may be used. For example, had Monitor
been written as a callable routine that expects an
integer variable, then an integer would have to be
used. But with a function (whether written in assembly
language or BASIC), any type of variable may be used,
and BASIC will figure out what to do:


X# = Monitor% 


The example above is perfectly legal, and any necessary 
type conversion will be performed automatically. 


Double precision values may also be returned, however
in that case the address of the data must be placed
in AX. Since a double precision variable requires
eight bytes, this is a sensible way to do things. Of
course, those eight bytes must be in the normal
DGROUP data segment, so BASIC will know where AX
is really pointing to.


Finally, an assembler function may return a string
variable, but this requires a bit more work. As with a
double precision function, an address will be left in
AX before returning to BASIC. However in this case,
AX will instead point to a string descriptor which the
function must create. You can see how a string
function is written by examining the source listing for
NUM2DATE.ASM.


TURBO BASIC 


Most of the information we have covered so far will 
apply equally to QuickBASIC, BASCOM, and Turbo Basic. 
However, there are a few important differences in the 
way Turbo Basic handles assembly language routines. 


Perhaps most important is that Turbo Basic (as of 
version 1.1) does not use LINK. Rather, the routines 
will be written in the form of a .COM file, which is 
then included directly into the BASIC source code. 
Eliminating the link step has the decided disadvantage 
of not allowing an assembler routine to be called by 
name. Even worse, there is no facility to include data 
items for local variables. 


Where the Microsoft BASIC compilers let you use DB and
DW freely, and then refer to those memory locations by
name, one or more devious tricks must be employed with
Turbo Basic. One possible approach is to create space
on the stack, and then access the variables with [BP+02],
[BP+10], and so forth. However, we use a slightly
different method in the Turbo version of QuickPak.


Whenever an assembler Call is executed, the address to 
return to is always placed on the stack. When the 
procedure finishes, the microprocessor knows to remove 
the address from the stack, and then jump there. That 
address will always be the next instruction to execute, 
much like the way a GOSUB is handled in BASIC. We 
can use this behavior to our advantage by executing a 
Call just before creating one or more DB or DW data 
statements. For example: 


Routine Proc Far

        Call Continue       ;skip over the data
Var1    DB   ?
Var2    DW   ?

Continue:
        Pop  BX             ;retrieve the address for Var1
        Mov  AX,3           ;put a 3 into AX
        Mov  CS:[BX+01],AX  ;move the 3 into Var2


When BX is popped from the stack it holds the address
of the next memory location, which in this case is
Var1. Notice that any variables defined this way will
always be in the code segment. Thus, any references to
those variables will always require a CS: segment
override prefix. A complete example of storing data in
Turbo Basic’s code segment this way may be found in 
the source listing for the APrint routine. 
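
The same trick works for reading a value back as well as
storing one. As a minimal sketch, assuming MASM-style
syntax, the fragment below keeps a word in the code
segment and adds 1 to it each time it runs. The names
Counter and Skip are hypothetical and do not come from
QuickPak.

```asm
        Call Skip              ;near call: pushes the offset of Counter
Counter DW   0                 ;hypothetical word kept in the code segment

Skip:
        Pop  BX                ;BX = offset of Counter within CS
        Inc  Word Ptr CS:[BX]  ;add 1 to Counter (CS: override required)
        Mov  AX,CS:[BX]        ;copy the updated count into AX
```

Because the Call jumps past Counter, the data bytes are
never executed as instructions.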



Another very important difference with Turbo Basic is 
the manner in which strings are stored. In Microsoft 
BASIC, all normal (not fixed-length) strings and their 
descriptors are always stored in the same 64K data 
segment. Simply dimensioning a string array to 3000
elements steals about 12K of string memory just for the
descriptors, at four bytes apiece.


Turbo Basic instead keeps strings in their own segment, 
and the descriptors in yet another segment. While this 
method provides more memory for strings, it makes them
much more difficult to access.


When an assembler routine is called with a string 
argument, the segmented address of the string 
descriptor is placed on the stack. The easiest way to 
retrieve that address is with LDS (or LES) as shown
below:

    LDS SI,[BP+06]         ;get the descriptor address
    Mov CX,[SI]            ;put the string length into CX
    Mov SI,[SI+02]         ;and the address of the data in SI


The only problem with this example is that while we can
freely get at the string descriptor, we still don’t know
which segment holds the string’s data.


All strings in a Turbo Basic program are held in one 
segment. That segment may be found by examining the 
very first two bytes of Turbo’s default data segment. 
Therefore, the first thing any assembler routine that
will access strings must do is to get those bytes before
changing DS, as shown below.


    Mov ES,DS:[00]             ;put the string data segment into ES
    LDS SI,[BP+06]             ;get descriptor address into DS:SI
    Mov CX,[SI]                ;the length usually goes in CX
    And CX,0111111111111111b   ;ensure that the hi-bit is not set
    Mov SI,[SI+02]             ;and the address in SI
    Push ES                    ;move ES into DS through the stack
    Pop DS                     ;now DS:SI points to the first
                               ; character in the string



Notice how the high bit in CX is forced off, just in 
case it was set by Turbo Basic. Even though a BASIC 
string can never be longer than 32767 characters, 
Turbo Basic apparently uses that bit for some internal 
purpose. If the bit is on and you fail to clear it, the 
assembler routine might fail because it thinks the 
string is much longer than it really is. 
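
To see the effect of the mask, and one way the resulting
DS:SI and CX pair might be used, here is a minimal sketch
in MASM-style syntax. It assumes DS:SI already points to
the string data and CX holds the length word as loaded
from the descriptor. The 16-bit mask 7FFFh (binary
0111111111111111), the use of DX as a counter, and the
labels NextChar, NotSpace, and Done are illustrative
choices, not part of QuickPak.

```asm
        And  CX,7FFFh          ;clear bit 15 only, e.g. 8005h becomes 0005h
        Xor  DX,DX             ;DX will count the spaces found
        Jcxz Done              ;a null string has nothing to scan
        Cld                    ;make Lodsb move forward through the string
NextChar:
        Lodsb                  ;AL = byte at DS:SI, then SI = SI + 1
        Cmp  AL,' '            ;is this character a space?
        Jne  NotSpace          ;no, skip the increment
        Inc  DX                ;yes, count it
NotSpace:
        Loop NextChar          ;decrement CX, repeat until CX = 0
Done:
```

Because DS no longer points to BASIC's data segment at
this point, a real routine would normally need to save DS
on entry and restore it before returning.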


SOURCE CODE 


The final important topic to discuss is how to create 
the source code for the assembler. Any ASCII text 
editor will work, though I prefer to use Borland’s 
SideKick. This way, when the assembler catches an 
error (and there’s always an error), a single key
press puts you back into the editor to fix it. The
assembler will tell you which lines have the errors, 
and SideKick’s status line helps you get there quickly. 
Then a second key press will re-save the text, ready 
to assemble again. 


By the way, one error in a source file may confuse the 
assembler, causing it to think that other subsequent 
lines are also in error—even when they’re not. For 
example, if you fail to declare the start of a procedure, 
any subsequent references to that procedure will be 
reported as an error. This can also happen when 
compiling QuickBASIC programs for the same reason. 



SUMMING UP 


As with any new endeavor, the only way to become 
proficient is to roll up your sleeves and start doing
it. And so it is with assembly language. Feel free to
experiment with all of the QuickPak routines, making 
changes to see what happens. (On a copy of the disk, 
of course!) Besides the BASIC enhancements, several 
stand-alone assembler programs are included on the 
QuickPak program disk for you to study and enjoy. 


NLQ and Tiny will send the control codes to an 
IBM/Epson printer to enable near letter quality and
tiny printing respectively. LptSwap allows swapping
LPT1: and LPT2: to accommodate programs that can talk 
to only one printer. FormFeed sends a form feed to 
your printer at any time by pressing Alt-PrtSc, and 
illustrates how to create programs that remain resident 
in memory. 


We sincerely hope that you have found this tutorial to 
be useful and enlightening. Your feedback and 
suggestions will be greatly appreciated, and can only 
help to make subsequent products and tutorials more 
valuable to all. 


[Illustration: "Now Let's Get Down To Business"]


RECOMMENDED READING 


There are many books that can help you learn more 
about assembly language. While the ones listed below 
represent only a few, we have found them to be quite 
valuable. 


Programmer’s Guide to the IBM PC by Peter Norton
Assembly Language Primer by Robert Lafore
Assembly Language Programming by David Bradley
Advanced MS-DOS by Ray Duncan
Programmer’s Problem Solver by Robert Jourdain


Norton’s book is one of the most useful references 
available for learning how to access the PC’s DOS and 
BIOS services, though it lacks many recent topics such 
as EGA and VGA BIOS services. 


The Assembly Language Primer is oriented towards 
beginners, and therefore never quite gets to the more 
advanced topics. However, it is particularly clear and 
well illustrated. 


David Bradley was one of the original developers of 
the PC hardware, and is obviously quite an authority. 
His book is very well written, though the topics 
quickly become advanced. If you are serious about 
learning assembly language, though, you can expect to 
grow into this book. 


Ray Duncan is well known in the PC community for
his expertise, and this book shows why. Though it is
billed as being for assembler and C programmers, it
is equally relevant for BASIC programmers.



The Programmer’s Problem Solver is interesting in that 
almost all of the topics covered are shown at three 
different levels. That is, most solutions are given 
first in BASIC, then in assembly language, and when 
appropriate at the hardware level as well. 


And of course, PC Magazine, PC Resource, and 
the Programmer's Journal continue to be vital 
sources of information for both beginning and 
professional programmers. 



Drawings by Jay Munro 
Entire contents Copyright (c) 1988 by Ethan Winer & Jay Munro 



This page was supposed to be left blank 
intentionally, but apparently someone 
in the software industry decided 
that blank pages must say: 


This Page Intentionally Blank 


This page isn't really blank. Or is it?


Crescent Software, Inc. 
32 Seventy Acres, West Redding, Ct 06896 
(203) 846-2500 


